DIGEST Version 1.0 by Ramin C. Nakisa

Usage
~~~~~
DIGEST [sequence filename]

Example
~~~~~~~
To cut the sequence contained in the file myseq.dna,

C:\>digest
DIGEST of what sequence ? myseq.dna

                SEQUENCE
                ~~~~~~~~
ID     : myseq
Length : 5207
Format : GCG

Begin ( * 1 * ):
End ( * 5207 * ):

Number of enzymes: 357

   *    to select all enzymes.
   individual names like AluI to select specific enzymes.
   ?    to see this message.
   ??   to see the available enzymes AND their recognition sites.
   /*   to see what enzymes you have selected so far.
   #    to start cutting!

Enzyme: bsri
BsrI FOUND!!!
Enzyme: #
What should I call the output file ? frags.txt
Digesting...
▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

C:\>type frags.txt

What Does it Do?
~~~~~~~~~~~~~~~~
DIGEST scans DNA sequence files for restriction sites. It prompts the
user to specify which enzymes to cut with, and if they are in the enzyme
database file WISCONSI.920 it writes out the positions of all the cuts
and sorts the fragments by size. It uses Don Gilbert's spankingly good
sequence reading module (UREADSEQ) so it can understand most of the
major sequence formats, namely

    1. IG/Stanford          10. Olsen (in-only)
    2. GenBank/GB           11. Phylip3.2
    3. NBRF                 12. Phylip
    4. EMBL                 13. Plain/Raw
    5. GCG                  14. PIR/CODATA
    6. DNAStrider           15. MSF
    7. Fitch                16. ASN.1
    8. Pearson/Fasta        17. PAUP
    9. Zuker                18. Pretty (out-only)

The program has been written to resemble the GCG MAPSORT program as
closely as possible, both in terms of user input and program output.
I thought this would make the program more user-friendly for molecular
biologists who have probably been weaned on GCG.

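The "positions of all the cuts" and "sorts the fragments by size" steps can be sketched in a few lines of Python. This is only an illustration, not DIGEST's own code (which is not shown in this file): the function names fragment_sizes and digest_report are made up here, and the sketch assumes a linear sequence with each cut position meaning "cut after this base". The numbers in the usage part are the BsrI cut positions from the sample output file in this README.

```python
def fragment_sizes(cuts, begin, end):
    """Return fragment sizes, in sequence order, for a linear digest.

    Assumption: a cut position c means the strand is cut after base c,
    using the same 1-based coordinates as begin and end.
    """
    # Keep only cuts that actually split the begin..end range.
    inside = sorted(c for c in set(cuts) if begin <= c < end)
    # Boundaries run from just before `begin` up to `end`.
    bounds = [begin - 1] + inside + [end]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def digest_report(cuts, begin, end):
    """Return (sizes in sequence order, sizes sorted largest first)."""
    sizes = fragment_sizes(cuts, begin, end)
    return sizes, sorted(sizes, reverse=True)


if __name__ == "__main__":
    # BsrI cut positions taken from the sample output file in this README.
    bsri_cuts = [130, 1434, 1447, 1555, 1961, 2079, 2122, 2395,
                 2561, 3020, 3114, 3460, 3669, 4446]
    in_order, by_size = digest_report(bsri_cuts, 1, 5207)
    print("Size:", *in_order)
    print("Fragments arranged by size:", *by_size)
```

A circular sequence would need the first and last fragments joined; that case is left out of the sketch.
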
Output files look like

 (Linear) DIGEST of: myseq.dna   from: 1   to: 5207

 BsrI  ACTG_Gn'

  Cuts at:      0    130   1434   1447   1555   1961   2079   2122   2395
     Size:        130   1304     13    108    406    118     43    273

  Cuts at:   2395   2561   3020   3114   3460   3669   4446   5207
     Size:        166    459     94    346    209    777    761

  Fragments arranged by size:

     1304    777    761    459    406    346    273    209    166    130
      118    108     94     43     13

 Enzymes that do cut:

     AciI   ApaLI    BsrI    SphI

 Enzymes that do not cut:

     NruI

How to Modify WISCONSI.920
~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are looking for a particular DNA motif, or have just discovered
a new restriction enzyme in a bug from a volcanic spring (or a New
England Biolabs catalogue) then adding a site to the database file is
REALLY easy. Just remember to use a text editor, or if using a word
processor remember to export the file in ASCII format. Here's what an
entry looks like:

EarI        7 CTCTTCn'nnn_     3 ! Eam1104I,Ksp632I                 >NU

Each enzyme is on a separate line. The individual fields are

    i)    Enzyme (or motif) name.
    ii)   Cut offset from first base of recognition site.
    iii)  Recognition sequence, with ' and _ marking where each
          strand is cut.
    iv)   Overhang length.
    v)    An obligatory exclamation mark.
    vi)   Isoschizomers.
    vii)  An obligatory greater than sign.
    viii) A list of commercial sources for the enzyme.

DIGEST ignores everything past the exclamation mark, so you can skip
that bit if you like. The program understands the IUPAC codes for base
pair ambiguity

    Symbol   Meaning
    ------   -------
      A      Adenine
      G      Guanine
      C      Cytosine
      T      Thymine
      U      Uracil
      Y      pYrimidine (C or T)
      R      puRine (A or G)
      W      "Weak" (A or T)
      S      "Strong" (C or G)
      K      "Keto" (T or G)
      M      aMino (C or A)
      B      not A (C or G or T)
      D      not C (A or G or T)
      H      not G (A or C or T)
      V      not T (A or C or G)
      N      unknown (A or C or G or T)

Grovelling Credits Section
~~~~~~~~~~~~~~~~~~~~~~~~~~
As new restriction enzymes are discovered the WISCONSI.920 database
will become out of date. You may then like to ftp a new version from
one of the molbio server sites. Rich Roberts keeps the information in
many formats, so make sure the one you get is of the GCG variety, as
described above.
The version of the database distributed with this version of DIGEST is
9206 (May 29 1992).

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

                        Dr. Richard J. Roberts
                     Restriction Enzyme Database
           Copyright (c) Cold Spring Harbor Laboratory 1992
                         All rights reserved.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

I think Don Gilbert is a marvellous man. UREADSEQ is FAB. In case you
ever read this, Don, next time you're in London drop in to Imperial and
I'll buy you a pint of Old Rosie at the Phoenix and Firkin. Here is the
header from UREADSEQ.C:

 * ReadSeq  --  30 Dec 92
 *
 * Reads and writes nucleic/protein sequences in various
 * formats. Data files may have multiple sequences.
 *
 * Copyright 1990 by d.g.gilbert
 * biology dept., indiana university, bloomington, in 47405
 * e-mail: gilbertd@bio.indiana.edu
 *
 * This program may be freely copied and used by anyone.
 * Developers are encourged to incorporate parts in their
 * programs, rather than devise their own private sequence
 * format.
 *
 * This should compile and run with any ANSI C compiler.
 * Please advise me of any bugs, additions or corrections.

Desperate Plea for Recognition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you enjoyed using DIGEST, please DON'T SEND ME ANY MONEY! I don't
want money. If I did I wouldn't have started a PhD. I want PRAISE!
RECOGNITION! FAME! PRAISE (again)! Please send your flattering
minutiae, ego boosters, gripes and suggested improvements by EMAIL to

    ramin@ic.ac.uk ................ for Internet people

Alternatively, SNAILMAIL:

    Ramin Nakisa,
    Biophysics Section,
    The Blackett Laboratory,
    Imperial College of Science, Technology and Medicine,
    Prince Consort Road,
    London SW7 2BZ
    Great Britain.

    Tel: 071 589-5111 x 6729
    FAX: 071 589-0191